Markov Decision Processes with Slow Scale Periodic Decisions

Authors

  • Matthew W. Jacobson
  • Nahum Shimkin
  • Adam Shwartz
Abstract

We consider a class of discrete-time, dynamic decision-making models which we refer to as Periodically Time-Inhomogeneous Markov Decision Processes (PTMDPs). In these models, the decision-making horizon can be partitioned into intervals, called slow scale cycles, of N+1 epochs. The transition law and reward function are time-homogeneous over the first N epochs of each slow scale cycle, but distinct at the final epoch. The motivation for such models lies in applications where decisions of a different nature are taken at different time scales, i.e., many “low-level” decisions are made between less frequent “high-level” ones. For the PTMDP model, we consider the problem of optimizing the expected discounted reward when rewards devalue by a discount factor at the beginning of each slow scale cycle. When N is large, initially stationary policies (i.s.p.’s) are natural candidates for optimal policies. Similar to turnpike policies, an initially stationary policy uses the same decision rule for some large number of epochs in each slow scale cycle, followed by a relatively short planning horizon of time-varying decision rules. In this paper, we characterize the form of the optimal value as a function of N, establish conditions ensuring the existence of near-optimal i.s.p.’s, and characterize their structure. Our analysis deals separately with the cases where the time-homogeneous part of the system has state-dependent and state-independent optimal average reward. As we illustrate, the results in these two distinct cases are qualitatively different.
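The cycle structure described in the abstract lends itself to backward induction: within one slow scale cycle, the fast epochs are backed up with the time-homogeneous kernel, the final epoch uses its distinct kernel, and the discount factor enters only at the cycle boundary. The following is a minimal NumPy sketch of one such cycle backup, not the paper's algorithm; the toy kernels `P`, `r`, `P_last`, `r_last`, the sizes, and the placement of the discount are invented assumptions for illustration.

```python
import numpy as np

# Hypothetical toy PTMDP: S states, A actions.
# P[a] is the S x S transition matrix for action a during the
# "fast" (time-homogeneous) epochs; P_last/r_last govern the
# distinct final epoch of each slow scale cycle.
rng = np.random.default_rng(0)
S, A, N = 3, 2, 5           # states, actions, fast epochs per cycle
beta = 0.9                  # discount applied once per slow scale cycle

def random_kernel(S, A, rng):
    """Random stochastic kernels, one S x S matrix per action."""
    P = rng.random((A, S, S))
    return P / P.sum(axis=2, keepdims=True)

P, r = random_kernel(S, A, rng), rng.random((A, S))
P_last, r_last = random_kernel(S, A, rng), rng.random((A, S))

def cycle_backup(V_next):
    """Backward induction over one slow scale cycle of N+1 epochs.

    Rewards devalue by beta at the cycle boundary, so the
    continuation value V_next is discounted in the final-epoch
    backup, while the N fast epochs are undiscounted.
    """
    # Distinct final epoch (epoch N+1 of the cycle).
    Q = r_last + beta * P_last @ V_next     # shape (A, S)
    V = Q.max(axis=0)
    policy = [Q.argmax(axis=0)]
    # N time-homogeneous fast epochs.
    for _ in range(N):
        Q = r + P @ V
        V = Q.max(axis=0)
        policy.append(Q.argmax(axis=0))
    policy.reverse()                        # decision rules, epochs 1..N+1
    return V, policy

V, policy = cycle_backup(np.zeros(S))
```

In this sketch an initially stationary policy would correspond to the leading decision rules in `policy` coinciding for most of the N fast epochs, with time-varying rules only near the end of the cycle.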


Similar Articles

Multi-Time-Scale Markov Decision Processes for Organizational Decision-Making

Decision-makers in organizations and other hierarchical systems interact within and across multiple organizational levels and take interdependent actions over time. The challenge is to identify incentive mechanisms that align agents’ interests and to provide these agents with guidance for their decision processes. To this end, we developed a multiscale decision-making model that combines game t...


Supply Chain Coordination using Different Modes of Transportation Considering Stochastic Price-Dependent Demand and Periodic Review Replenishment Policy

In this paper, an incentive scheme based on crashing lead time is proposed to coordinate a supplier-retailer supply chain (SC). In the investigated SC, the supplier applies a lot-for-lot replenishment policy to replenish its stock and determines the replenishment multiplier. Moreover, the transportation lead time is considered under the control of the supplier. The retailer as downstream member...


Multitime scale Markov decision processes

This paper proposes a simple analytical model called M time-scale Markov Decision Process (MMDP) for hierarchically structured sequential decision making processes, where decisions in each level in the M -level hierarchy are made in M different time-scales. In this model, the state space and the control space of each level in the hierarchy are non-overlapping with those of the other levels, res...


Accelerated decomposition techniques for large discounted Markov decision processes

Many hierarchical techniques to solve large Markov decision processes (MDPs) are based on the partition of the state space into strongly connected components (SCCs) that can be classified into some levels. In each level, smaller problems named restricted MDPs are solved, and then these partial solutions are combined to obtain the global solution. In this paper, we first propose a novel algorith...


Multilinear and Integer Programming for Markov Decision Processes with Imprecise Probabilities

Markov Decision Processes (MDPs) are extensively used to encode sequences of decisions with probabilistic effects. Markov Decision Processes with Imprecise Probabilities (MDPIPs) encode sequences of decisions whose effects are modeled using sets of probability distributions. In this paper we examine the computation of Γ-maximin policies for MDPIPs using multilinear and integer programming. We d...



Journal:
  • Math. Oper. Res.

Volume 28, Issue —

Pages —

Publication year: 2003